require(xlsx)
library("forecast")
library("fpp")
library("fpp2")
library("gridExtra")
library("ggplot2")
library(plotly)
time <- read.xlsx("data_g10.xlsx", sheetName = "6.41 r")
We calculate the frequency of the time series; we need it to transform the table into a time series object. Then we plot the series.
freq<-100
time.ts <- ts(time[,2],frequency = freq)
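As a side note (a toy sketch with random numbers, not our data): the frequency we pass to ts() fixes how many observations make up one cycle, which is what the season plots and decompositions later rely on:

```r
set.seed(1)
x <- ts(rnorm(10), frequency = 5)  # 5 observations per cycle

frequency(x)  # 5
cycle(x)      # position within each cycle: 1 2 3 4 5 1 2 3 4 5
time(x)       # fractional cycle index: 1.0, 1.2, 1.4, ...
```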
plot_ly(x = time$Fecha, y = time$Tipo, type = 'scatter', mode = 'lines')  # type = 'scatter' avoids the "no trace type specified" warning
-Nacho and Panos: “Wow, so beautiful and nice. Good job Mike, thanks for that plot!”
-Mike : “Fuck Olympiakos”
Let's move on.
Let's see if our data change under a logarithmic transformation.
plot_ly(x = time$Fecha, y = log(time$Tipo), type = 'scatter', mode = 'lines')
-Nacho and Panos: “They look the same!”
-Mike: “That's right boyzzz”
-Javier : “less plots for shiny then!”
-Dani : “and for the report!”
Let's answer some questions now.
Do our data have seasonality? What kind of correlations do we have?
#seasonality
ggmonthplot(time.ts)
ggseasonplot(time.ts)
# Exploring correlations of lagged observations
gglagplot(time.ts, lags = 9, do.lines = FALSE)  # the argument is "lags", not "lag"
Comments:
Non-stationary.
Trend: the measures tend to decrease over time, i.e. a downward trend.
Seasonality: we cannot see any seasonality in the measures; there is no regular pattern of highs and lows.
Cyclicality: there might be some cyclicality, as highs reappear over time at irregular intervals.
Needed for the next step: the logarithmic transformation makes no difference, so we will not need a multiplicative decomposition.
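To back up that last comment (a sketch on a synthetic series, not our data): a multiplicative decomposition of x and an additive decomposition of log(x) carry the same information, so if the log plot looks the same there is nothing to gain from the multiplicative route:

```r
# Synthetic multiplicative series: trend * seasonal * noise
set.seed(1)
tt <- 1:120
x  <- ts((10 + 0.05 * tt) * rep(1 + 0.2 * sin(2 * pi * (1:12) / 12), 10) *
           exp(rnorm(120, sd = 0.01)), frequency = 12)

mult <- decompose(x, type = "multiplicative")  # seasonal factors around 1
addl <- decompose(log(x))                      # additive on the log scale

# The additive seasonal of log(x) is (approximately) the log of the
# multiplicative seasonal of x, up to estimation noise:
max(abs(addl$seasonal - log(mult$seasonal)))
```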
That's it for Question 1.
Let's see Question 2:
There are many types of decomposition; here I use three of them. You can see the plots in the actual file; I do not include them here because they are huge.
timdec1 <- decompose(time.ts)  # simple (classical) decomposition
timdec2 <- stl(time.ts, s.window = freq, t.window = 15, robust = TRUE)  # t.window controls wiggliness of trend component
timdec3 <- stl(time.ts, s.window = "periodic", robust = TRUE)  # we keep the seasonal component identical across years; note s.window must be given only once
plot(timdec1)
plot(timdec2)
plot(timdec3)
Now it's time to forecast. We can select whichever decomposition we like. Method: ETS
fcst <- forecast(timdec2, method = "ets", h = 24)
plot(fcst)
fcst$mean
## Time Series:
## Start = c(3, 26)
## End = c(3, 49)
## Frequency = 100
## [1] 3.107201 3.026959 3.057222 3.035247 3.039892 3.011511 3.090908
## [8] 3.243595 3.327602 3.365985 3.391159 3.464175 3.516374 3.501029
## [15] 3.507341 3.545001 3.592090 3.613119 3.654668 3.727423 3.698898
## [22] 3.685927 3.616223 3.630654
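In case anyone wonders what forecast() does with an stl object here: roughly, it forecasts the seasonally adjusted series (trend + remainder) with the chosen model and then re-seasonalizes by repeating the last seasonal cycle. A base-R sketch of that idea on a synthetic monthly series, with HoltWinters standing in for ETS (an assumption to keep the sketch self-contained; this mirrors the logic, it is not the forecast package's actual code):

```r
set.seed(42)
x <- ts(10 + 0.1 * (1:96) + rep(sin(2 * pi * (1:12) / 12), 8) +
          rnorm(96, sd = 0.2), frequency = 12)

fit     <- stl(x, s.window = "periodic")
seasadj <- x - fit$time.series[, "seasonal"]  # seasonally adjusted series

# Forecast the adjusted series with exponential smoothing (trend, no seasonality)
hw <- HoltWinters(seasadj, gamma = FALSE)
fc <- as.numeric(predict(hw, n.ahead = 24))

# Re-seasonalize: repeat the last fitted seasonal cycle over the horizon
last_season <- as.numeric(tail(fit$time.series[, "seasonal"], 12))
forecast_24 <- fc + rep(last_season, 2)
head(forecast_24)
```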
Method: ARIMA
fcst <- forecast(timdec3, method = "arima", h = 24)
plot(fcst)
fcst$mean
## Time Series:
## Start = c(3, 26)
## End = c(3, 49)
## Frequency = 100
## [1] 2.773071 2.675505 2.697414 2.668790 2.676774 2.655042 2.748253
## [8] 2.916933 3.024319 3.086736 3.145039 3.254617 3.347737 3.379109
## [15] 3.436718 3.527957 3.633884 3.715038 3.814759 3.948929 3.993523
## [22] 4.051831 4.068381 4.155433
Let's check the autocorrelations. I don't know which plot represents our data best, so I include the ones I found (there are two more plots in time.r).
ACF1
adjusted_diffts <- time.ts - timdec1$seasonal  # seasonally adjusted series
acf(adjusted_diffts, lag.max = 12, type = "correlation", plot = TRUE, na.action = na.contiguous, demean = TRUE)  # acf's type takes a single value; a vector would silently use only the first
ACF2
ggAcf(adjusted_diffts)
PACF
Pacf(adjusted_diffts, lag.max = 12, plot = TRUE, na.action = na.contiguous, demean = TRUE)
I was not able to find the remainder part and I am not sure what it is, so… But none of these look like white noise, so that's sweet!
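About “the remainder part”: for an stl fit it lives in the time.series matrix (so timdec2$time.series[, "remainder"] for us), and for decompose() it is the $random component. A quick white-noise check on it would be a Ljung-Box test; here is a sketch on a synthetic series, not run on our data:

```r
set.seed(7)
x   <- ts(5 + rep(sin(2 * pi * (1:12) / 12), 8) + rnorm(96, sd = 0.3),
          frequency = 12)
fit <- stl(x, s.window = "periodic")

remainder <- fit$time.series[, "remainder"]  # the "remainder part" of the stl fit

# Ljung-Box: a large p-value means the remainder is consistent with white noise
Box.test(remainder, lag = 12, type = "Ljung-Box")
```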
A) We used both plot() and tsdisplay() in the time.r file and saw no difference, so we stick to the original data. In time.r I have used some commands that might be useful for the remaining questions (probably B and C); check it out before you start.
We can interpret almost everything from the plots, but for some parts I will need your help. Until then, merry Christmas, Data Heretics!!!